All Questions
Tagged with model-evaluation and cross-validation
26 questions
0 votes · 0 answers · 23 views
How to use cross-validation to select/evaluate a model with a probability score as the output?
Initially I was evaluating my models using cross_val with off-the-shelf metrics such as precision, recall, F1 score, etc., or with my own metrics defined in ...
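A minimal sketch of scoring probabilistic output directly in cross-validation, assuming scikit-learn, a synthetic binary dataset, and logistic regression as a stand-in model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_val_predict

X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Probability-aware scorers: log loss and ROC AUC consume predict_proba internally.
print(cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss").mean())
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())

# Out-of-fold probability scores, e.g. for calibration curves or custom metrics.
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
```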
0 votes · 0 answers · 87 views
XGBoost Classifier Evaluation Confusion on New Dataset Despite High Cross-Validation Scores
I have built an XGBoost classifier model with 90 features, trained on a dataset containing 760k samples. I took great care to separate the labels from the features in both the training and testing ...
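A sketch of one way to check whether high CV scores transfer to genuinely unseen data, assuming xgboost's scikit-learn wrapper and a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=90, random_state=0)
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = XGBClassifier(n_estimators=200)

# CV estimate computed on the development split only.
print("CV F1:", cross_val_score(clf, X_dev, y_dev, cv=5, scoring="f1").mean())

# Refit on the whole development split, then score the untouched hold-out once.
clf.fit(X_dev, y_dev)
print("Hold-out F1:", f1_score(y_holdout, clf.predict(X_holdout)))
```

A large gap between the two numbers usually points at leakage in the CV folds or a distribution shift in the new data rather than at the model itself.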
0 votes · 3 answers · 877 views
For cross-validation, should I use the training set or the whole dataset?
I'm new to data science and I have a problem understanding what dataset to use when using cross validation for model evaluation. Let's say I have two models: LogisticRegression and ...
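A sketch of the usual workflow (one common convention, not the only defensible one), assuming scikit-learn: cross-validate on the training split to compare models, and keep the test split untouched until the very end:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(random_state=0),
}
for name, model in candidates.items():
    print(name, cross_val_score(model, X_train, y_train, cv=5).mean())

# Only the chosen model ever sees the test set, and only once.
best = candidates["forest"].fit(X_train, y_train)
print("test accuracy:", best.score(X_test, y_test))
```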
0 votes · 1 answer · 747 views
Can I use GridSearchCV.best_score_ for evaluation of model performance?
Scikit-learn page on Grid Search says: Model selection by evaluating various parameter settings can be seen as a way to use the labeled data to “train” the parameters of the grid. When evaluating the ...
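A sketch contrasting best_score_ (a model-selection score, hence optimistically biased) with two less biased estimates, assuming scikit-learn and an SVM as a placeholder estimator:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)
grid.fit(X_train, y_train)

print("best_score_ (used to pick the parameters):", grid.best_score_)
print("held-out test score:", grid.score(X_test, y_test))

# Nested CV treats the whole search as a single estimator.
print("nested CV estimate:", cross_val_score(grid, X, y, cv=5).mean())
```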
0 votes · 1 answer · 116 views
How do I know if my regression model is underfitting?
How do we evaluate the performance of a regression model with a certain RMSE, given that no domain-specific performance benchmark is available? Maybe MAPE is one way of comparing the performance of my ...
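A sketch of two cheap diagnostics when no domain benchmark exists, assuming scikit-learn and synthetic data: compare against a trivial baseline, and compare train versus validation error:

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

for name, model in [("baseline", DummyRegressor(strategy="mean")),
                    ("linear", LinearRegression())]:
    res = cross_validate(model, X, y, cv=5,
                         scoring="neg_root_mean_squared_error",
                         return_train_score=True)
    print(name,
          "train RMSE:", -res["train_score"].mean(),
          "val RMSE:", -res["test_score"].mean())

# High error on both train and validation, close to the baseline, suggests underfitting;
# a large train/validation gap suggests overfitting instead.
```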
1 vote · 0 answers · 306 views
Grouped stratified train-val-test split for a multilabel dataset
So this is indeed nontrivial. I was wondering if there is a fast heuristic algorithm for performing a grouped, stratified dataset split on a multilabel dataset. Stratification is usually performed to ...
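One rough heuristic sketch only (an assumption, not an established algorithm): collapse the label matrix to each sample's rarest positive label and hand that proxy to StratifiedGroupKFold (scikit-learn >= 1.0), which enforces the grouping; the multilabel stratification is then only approximate:

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
Y = (rng.random((1000, 5)) < 0.2).astype(int)   # multilabel indicator matrix
groups = rng.integers(0, 200, size=1000)        # e.g. patient or user IDs
X = rng.normal(size=(1000, 10))

label_freq = Y.sum(axis=0)
# Rarest positive label per sample, with a sentinel class for all-zero rows.
proxy = np.where(Y.any(axis=1),
                 np.argmin(np.where(Y == 1, label_freq, np.inf), axis=1),
                 Y.shape[1])

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, proxy, groups):
    print(len(train_idx), len(test_idx))
```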
1 vote · 0 answers · 633 views
How exactly do eval_set and RandomizedSearchCV work for LightGBM?
How does RandomizedSearchCV form its validation sets when I have also defined an evaluation set for LGBM? Are they formed from the training set I provided, and how does the evaluation set come into the validation? ...
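A sketch of how the two validation mechanisms coexist, assuming lightgbm's scikit-learn wrapper with callback-style early stopping (newer lightgbm releases): RandomizedSearchCV builds its own folds from whatever is passed to fit(), while the eval_set forwarded through fit parameters is a fixed set that LightGBM uses only for early stopping inside each fold:

```python
import lightgbm as lgb
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_es, y_train, y_es = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    lgb.LGBMClassifier(),
    param_distributions={"num_leaves": randint(8, 64),
                         "n_estimators": randint(50, 300)},
    n_iter=10, cv=3, random_state=0,
)

# CV folds come from (X_train, y_train); (X_es, y_es) only drives early stopping.
search.fit(
    X_train, y_train,
    eval_set=[(X_es, y_es)],
    callbacks=[lgb.early_stopping(stopping_rounds=20, verbose=False)],
)
print(search.best_params_, search.best_score_)
```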
0 votes · 0 answers · 52 views
How to evaluate model accuracy at tail of empirical distribution?
I am fitting a nonlinear regression on a stationary dependent variable and I want to forecast extreme values of this variable precisely. So when my model predicts extreme values, I want them to be highly ...
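A sketch of one way to score tail performance: restrict the error metric to observations beyond a chosen quantile of the observed variable (the 95% threshold and the synthetic heavy-tailed data below are assumptions):

```python
import numpy as np

def tail_rmse(y_true, y_pred, quantile=0.95):
    """RMSE restricted to observations above the given quantile of y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = y_true >= np.quantile(y_true, quantile)
    return np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2))

rng = np.random.default_rng(0)
y_true = rng.standard_t(df=3, size=5000)                    # heavy-tailed target
y_pred = 0.8 * y_true + rng.normal(scale=0.5, size=5000)    # stand-in forecasts

print("overall RMSE:", np.sqrt(np.mean((y_true - y_pred) ** 2)))
print("upper-tail RMSE:", tail_rmse(y_true, y_pred, 0.95))
```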
0 votes · 2 answers · 320 views
I am attempting to implement k-fold cross-validation in Python 3. What is the best way to implement this? Is it preferable to use pandas or NumPy? [closed]
I am attempting to create a script to implement cross-validation on my data. However, the splits cannot take records at random, so that training and testing can be done on equal data splits for each ...
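A minimal sketch, assuming scikit-learn's KFold with NumPy indexing (the same integer indices work on a pandas DataFrame via .iloc); shuffle=False keeps the folds contiguous and equally sized:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # toy feature matrix
y = np.arange(10)                  # toy labels

kf = KFold(n_splits=5, shuffle=False)   # contiguous, equally sized folds
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print(f"fold {fold}: test rows {test_idx}")

# For a pandas DataFrame df: df.iloc[train_idx] / df.iloc[test_idx]
```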
1 vote · 1 answer · 725 views
n_jobs=-1 or n_jobs=1?
I am confused regarding the n_jobs parameter used in some models and for CV. I know it is used for parallel computing, where it uses the number of processors specified in the n_jobs parameter. So if I ...
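A sketch of the convention in scikit-learn, assuming a random forest as a placeholder: n_jobs=-1 uses every available core, a positive integer caps the number of workers, and 1 keeps everything in a single process:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Folds evaluated in parallel across all cores ...
parallel = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5, n_jobs=-1)
# ... or sequentially in one process.
serial = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5, n_jobs=1)
print(parallel.mean(), serial.mean())

# Setting n_jobs=-1 on both the estimator and the CV loop can oversubscribe cores,
# so it is common to parallelise only one of the two levels.
```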
1 vote · 0 answers · 126 views
Imbalanced dataset, finding the statistical significance of a Matthews Correlation Coefficient (MCC) in binary classification (what is a good MCC)?
I have a very imbalanced dataset. Thus, I am using MCC to evaluate the performance of various ML algorithms. It appears that the literature is entirely lacking in ways to evaluate how good an MCC score is ...
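A sketch of a label-permutation test for MCC, assuming scikit-learn and a synthetic imbalanced dataset: shuffle the labels many times and see how often a chance-level model reaches the observed score:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import permutation_test_score

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

score, perm_scores, p_value = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y,
    scoring=make_scorer(matthews_corrcoef),
    cv=5, n_permutations=200, random_state=0,
)
print("MCC:", score, "p-value against the label-shuffled baseline:", p_value)
```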
0 votes · 1 answer · 2k views
Machine Learning validation data returns 100% accuracy [closed]
I'm testing a machine learning model whose validation data returns 100% correct answers. Is it overfitting, or does the model work extremely well? Do I need to continue training on more data? I'...
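A sketch of two quick sanity checks worth running before trusting a perfect validation score; the tiny train_df / val_df frames with a target column are purely hypothetical placeholders:

```python
import pandas as pd

train_df = pd.DataFrame({"f1": [1, 2, 3, 4], "f2": [0, 1, 0, 1], "target": [0, 1, 0, 1]})
val_df = pd.DataFrame({"f1": [3, 5], "f2": [0, 1], "target": [0, 1]})

# 1. Are validation rows duplicated from the training set (data leakage)?
overlap = pd.merge(train_df.drop(columns="target"),
                   val_df.drop(columns="target"), how="inner")
print("rows shared between train and validation:", len(overlap))

# 2. Does a single feature almost perfectly track the target (a target-leakage smell)?
print(train_df.drop(columns="target").corrwith(train_df["target"]).abs().sort_values())
```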
0 votes · 1 answer · 379 views
Difference between validation and prediction
As a follow-up to "Validate via predict() or via fit()?", I wonder about the difference between validation and prediction. To keep it simple, I will refer to train, ...
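A sketch of how the three splits are typically used, assuming scikit-learn and the train / validation / test naming from the question: validation scores a fitted model against known labels, while prediction only needs inputs:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Validation: labels are known, and the score guides model/hyperparameter choices.
print("validation accuracy:", model.score(X_val, y_val))

# Prediction: only the inputs are needed; in production the true labels may never arrive.
print("predictions for 'new' data:", model.predict(X_test)[:5])
```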
1 vote · 1 answer · 115 views
Validity of cross-validation for model performance estimation
When applying cross-validation for estimating the performance of a predictive model, the reported performance is usually the average performance over all the validation folds. As during this procedure,...
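A sketch of why the CV estimate tends to be slightly pessimistic, assuming scikit-learn: each fold's model trains on only (k-1)/k of the data, and learning_curve shows the validation score still improving as the training fraction grows toward the full set:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)
sizes, _, val_scores = learning_curve(
    LogisticRegression(max_iter=5000), X, y, cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
)
for n, s in zip(sizes, val_scores.mean(axis=1)):
    print(f"trained on {n} samples -> mean validation accuracy {s:.3f}")
```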
5 votes · 3 answers · 7k views
In k-fold cross-validation, why do we compute the mean of the metric over the folds?
In k-fold cross-validation, the "correct" scheme seems to be to compute the metric (say, the accuracy) for each fold and then return the mean as the final metric. Source: https://scikit-learn.org/stable/...
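A minimal sketch of the aggregation in question, assuming scikit-learn: one score per fold, the mean as the point estimate, and the standard deviation as a rough measure of its spread:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5, scoring="accuracy")

print("per-fold accuracy:", scores)
print(f"mean ± std: {scores.mean():.3f} ± {scores.std():.3f}")
```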